Online solution of the average cost Kullback-Leibler optimization problem
Abstract
We introduce a stochastic approximation method for the solution of a Kullback-Leibler (KL) optimization problem; the method is a generalization of Z-learning, introduced by [Todorov, 2007]. A KL-optimization problem is a Markov decision process with a finite state space and a continuous control space. Because the control cost has a special form involving the Kullback-Leibler divergence, it can be shown that the problem may be solved essentially by finding the largest eigenvalue of a non-negative matrix and a corresponding eigenvector. The stochastic algorithm presented in this paper may be used to solve this problem. It allows for a sound theoretical analysis and can be shown to be comparable to the power method in terms of convergence speed. It may be used as the basis of a reinforcement-learning-style algorithm for Markov decision problems.
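For orientation, the power method that the abstract uses as its point of comparison can be sketched as follows. This is a minimal illustration of the deterministic baseline, not the stochastic algorithm of the paper. The function name `power_method`, its parameters, and the example matrix `H` are assumptions made here; the abstract does not specify the matrix that arises from the KL problem. The sketch also assumes `H` is non-negative and primitive, so that the iteration converges to the dominant eigenpair.

```python
import numpy as np


def power_method(H, num_iters=1000, tol=1e-12):
    """Approximate the largest eigenvalue of a non-negative square matrix H
    and a corresponding eigenvector normalized to sum to 1.

    Hypothetical helper for illustration; assumes H is primitive.
    """
    H = np.asarray(H, dtype=float)
    n = H.shape[0]
    z = np.ones(n) / n  # positive starting vector with entries summing to 1
    lam = 0.0
    for _ in range(num_iters):
        w = H @ z
        # Because z sums to 1, the sum of H @ z estimates the eigenvalue.
        lam = w.sum()
        w = w / lam  # renormalize so the iterate again sums to 1
        converged = np.abs(w - z).max() < tol
        z = w
        if converged:
            break
    return lam, z


# Illustrative matrix only (an assumption, not taken from the paper).
H = np.array([[0.5, 0.5],
              [0.2, 0.3]])
lam, z = power_method(H)
```

Each iteration here needs a full matrix-vector product, that is, access to every entry of the matrix. A stochastic approximation method of the kind the abstract describes would instead update its estimates from sampled transitions.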
Similar resources
Using Kullback-Leibler distance for performance evaluation of search designs
This paper considers the search problem introduced by Srivastava [Sr], which is a model discrimination problem. In the context of search linear models, the discrimination ability of search designs has been studied by several researchers. Some criteria have been developed to measure this capability; however, they are restricted in a sense of being able to work for searching only one possibl...
Comparison of Kullback-Leibler, Hellinger and LINEX with Quadratic Loss Function in Bayesian Dynamic Linear Models: Forecasting of Real Price of Oil
This paper examines the application of the Kullback-Leibler, Hellinger, and LINEX loss functions in dynamic linear models (DLMs), using 106 years of data on the real price of oil, from 1913 to 2018, with attention to the asymmetry problem in filtering and forecasting. We use the DLM form of the basic Hotelling model under the quadratic, Kullback-Leibler, Hellinger, and LINEX loss functions, trying to address the ...
KL-learning: Online solution of Kullback-Leibler control problems
We introduce a stochastic approximation method for the solution of an ergodic Kullback-Leibler control problem. A Kullback-Leibler control problem is a Markov decision process on a finite state space in which the control cost is proportional to a Kullback-Leibler divergence of the controlled transition probabilities with respect to the uncontrolled transition probabilities. The algorithm discus...
Ordinary Differential Equation Methods for Markov Decision Processes and Application to Kullback-Leibler Control Cost
A new approach to computation of optimal policies for MDP (Markov decision process) models is introduced. The main idea is to solve not one, but an entire family of MDPs, parameterized by a weighting factor $\zeta$ that appears in the one-step reward function. For an MDP with $d$ states, the family of value functions $\{h^*_\zeta : \zeta \in \mathbb{R}\}$ is the solution to an ODE, $\frac{d}{d\zeta} h^*_\zeta = V(h^*_\zeta)$, where the vector field $V : \mathbb{R}$...
Kullback-Leibler Divergence Based Distributed Cubature Kalman Filter and Its Application in Cooperative Space Object Tracking
In this paper, a distributed Bayesian filter design was studied for nonlinear dynamics and measurement mapping based on Kullback–Leibler divergence. In a distributed structure, the nonlinear filter becomes a challenging problem, since each sensor cannot access the global measurement likelihood function over the whole network, and some sensors have weak observability of the state. To solve the p...